{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "e5545a38-44a7-4aca-be6d-a66c51c75ec8",
     "showTitle": false,
     "title": ""
    }
   },
   "source": [
    "# Product Recommendation with Feathr on Azure (Advanced)\n",
    "\n",
    "This notebook illustrates the use of Feathr Feature Store to create a model that predict users' rating for different products for a e-commerce website.\n",
    "\n",
    "### Model Problem Statement\n",
    "The e-commerce website has collected past user ratings for various products. The website also collected data about user and product, like user age, product category etc. Now we want to predict users' product rating for new product so that we can recommend the new product to users that give a high rating for those products.\n",
    "\n",
    "### Feature Creation Illustration\n",
    "In this example, our observation data has compound entity key where a record is uniquely identified by `user_id` and `product_id`. With that, we can think about three types of features:\n",
    "1. **User features** that are different for different users but are the same for different products. For example, user age is different for different users but it's product-agnostic.\n",
    "2. **Product features** that are different for different products but are the same for all the users.\n",
    "3. **User-to-product** features that are different for different users AND different products. For example, a feature to represent if the user has bought this product before or not.\n",
    "\n",
    "In this example, we will focus on the first two types of features. After we train a model based on those features, we predict the product ratings that users will give for the products.\n",
    "\n",
    "The feature creation flow is as below:\n",
    "![Feature Flow](https://github.com/feathr-ai/feathr/blob/main/docs/images/product_recommendation_advanced.jpg?raw=true)"
   ]
  },
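  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the compound key concrete, here is a minimal pandas sketch with made-up data (the values and the `user_age` and `price` columns are illustrative assumptions, not the tutorial datasets). It only illustrates that user features attach by `user_id` and product features by `product_id`; it is not how Feathr performs the join.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Made-up observation rows, each identified by (user_id, product_id)\n",
    "observations = pd.DataFrame({\n",
    "    \"user_id\": [1, 1, 2],\n",
    "    \"product_id\": [10, 11, 10],\n",
    "    \"product_rating\": [5, 3, 4],\n",
    "})\n",
    "# User features depend only on user_id\n",
    "user_features = pd.DataFrame({\"user_id\": [1, 2], \"user_age\": [30, 45]})\n",
    "# Product features depend only on product_id\n",
    "product_features = pd.DataFrame({\"product_id\": [10, 11], \"price\": [9.99, 24.5]})\n",
    "\n",
    "# Each feature table is joined on its own part of the compound key\n",
    "training = (\n",
    "    observations\n",
    "    .merge(user_features, on=\"user_id\", how=\"left\")\n",
    "    .merge(product_features, on=\"product_id\", how=\"left\")\n",
    ")\n",
    "print(training)\n",
    "```\n",
    "\n",
    "Every observation row keeps its label and gains one column per user feature and per product feature."
   ]
  },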
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "52b7d651-19d4-44b0-a7a8-03549f49e524",
     "showTitle": false,
     "title": ""
    }
   },
   "source": [
    "## 1. Prerequisites\n",
    "\n",
    "### Use Azure Resource Manager (ARM) to Provision Azure Resources for Feathr\n",
    "\n",
    "First step is to provision required cloud resources if you want to use Feathr. Feathr provides a python based client to interact with cloud resources.\n",
    "\n",
    "Please follow the steps [here](https://feathr-ai.github.io/feathr/how-to-guides/azure-deployment-arm.html) to provision required cloud resources. Due to the complexity of the possible cloud environment, it is almost impossible to create a script that works for all the use cases. Because of this, [azure_resource_provision.sh](https://github.com/feathr-ai/feathr/blob/main/docs/how-to-guides/azure_resource_provision.sh) is a full end to end command line to create all the required resources, and you can tailor the script as needed, while [the companion documentation](https://feathr-ai.github.io/feathr/how-to-guides/azure-deployment-cli.html) can be used as a complete guide for using that shell script.\n",
    "\n",
    "If you already have an existing resource group and only want to install few resources manually you can refer to the cli documentation [here](https://feathr-ai.github.io/feathr/how-to-guides/azure-deployment-cli.html). It provides CLI commands to install the needed resources.\n",
    "\n",
    "> Please Note: CLI documentation is for advance users since there are lot of configurations and role assignment that would have to be done manually. Therefore, ARM template is the preferred way to deploy.\n",
    "\n",
    "The below architecture diagram represents how different resources interact with each other.\n",
    "![Architecture](https://github.com/feathr-ai/feathr/blob/main/docs/images/architecture.png?raw=true)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Set the required permissions\n",
    "\n",
    "Before you proceed further, you would need additional permissions to:\n",
    "* Access the keyvault,\n",
    "* Access the Storage Blob as a Contributor, and\n",
    "* Submit jobs to Synapse cluster.\n",
    "\n",
    "Run the following commands in the [Cloud Shell](https://shell.azure.com) before moving forward. Please replace `YOUR_RESOURCE_PREFIX` with the value you used in ARM template deployment.\n",
    "\n",
    "```\n",
    "    resource_prefix=\"YOUR_RESOURCE_PREFIX\"\n",
    "    synapse_workspace_name=\"${resource_prefix}syws\"\n",
    "    keyvault_name=\"${resource_prefix}kv\"\n",
    "    objectId=$(az ad signed-in-user show --query id -o tsv)\n",
    "    az keyvault update --name $keyvault_name --enable-rbac-authorization false\n",
    "    az keyvault set-policy -n $keyvault_name --secret-permissions get list --object-id $objectId\n",
    "    az role assignment create --assignee $userId --role \"Storage Blob Data Contributor\"\n",
    "    az synapse role assignment create --workspace-name $synapse_workspace_name --role \"Synapse Contributor\" --assignee $userId\n",
    "```\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "1ec709d2-62ef-48c7-b915-9790afdac589",
     "showTitle": false,
     "title": ""
    }
   },
   "source": [
    "### Install Feathr python package and it's dependencies\n",
    "\n",
    "Here, we install the package from the repository's main branch. To use the latest release, you may run `pip install \"feathr[notebook]\"` instead. If so, however, some of the new features in this notebook might not work."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Uncomment and run this cell to install feathr from the latest codes in the repo. You may use `pip install \"feathr[notebook]\"` as well.\n",
    "#%pip install \"git+https://github.com/feathr-ai/feathr.git#subdirectory=feathr_project&egg=feathr[notebook]\"  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# We also install InterpretML package for model explainability\n",
    "%pip install interpret"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Config Feathr Client"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "0f3135eb-15c5-4f46-90ff-881a21cc59df",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "import glob\n",
    "import os\n",
    "import tempfile\n",
    "from datetime import datetime, timedelta\n",
    "from math import sqrt\n",
    "\n",
    "import pandas as pd\n",
    "from pyspark.sql import DataFrame\n",
    "\n",
    "\n",
    "import feathr\n",
    "from feathr import (\n",
    "    FeathrClient,\n",
    "    BOOLEAN, FLOAT, INT32, ValueType,\n",
    "    Feature, DerivedFeature, FeatureAnchor,\n",
    "    BackfillTime, MaterializationSettings,\n",
    "    FeatureQuery, ObservationSettings,\n",
    "    RedisSink,\n",
    "    INPUT_CONTEXT, HdfsSource,\n",
    "    WindowAggTransformation,\n",
    "    TypedKey,\n",
    ")\n",
    "from feathr.datasets.constants import (\n",
    "    PRODUCT_RECOMMENDATION_USER_OBSERVATION_URL,\n",
    "    PRODUCT_RECOMMENDATION_USER_PROFILE_URL,\n",
    "    PRODUCT_RECOMMENDATION_USER_PURCHASE_HISTORY_URL,\n",
    "    PRODUCT_RECOMMENDATION_PRODUCT_DETAIL_URL,\n",
    ")\n",
    "from feathr.datasets.utils import maybe_download\n",
    "from feathr.utils.config import generate_config\n",
    "from feathr.utils.job_utils import get_result_df\n",
    "from feathr.utils.platform import is_databricks\n",
    "\n",
    "\n",
    "print(f\"Feathr version: {feathr.__version__}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "parameters"
    ]
   },
   "outputs": [],
   "source": [
    "RESOURCE_PREFIX = \"\"  # TODO fill the value used to deploy the resources via ARM template\n",
    "PROJECT_NAME = \"product_recommendation\"\n",
    "\n",
    "# Currently support: 'azure_synapse', 'databricks', and 'local' \n",
    "SPARK_CLUSTER = \"local\"\n",
    "\n",
    "# TODO fill values to use databricks cluster:\n",
    "DATABRICKS_CLUSTER_ID = None             # Set Databricks cluster id to use an existing cluster\n",
    "if is_databricks():\n",
    "    # If this notebook is running on Databricks, its context can be used to retrieve token and instance URL\n",
    "    ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext()\n",
    "    DATABRICKS_WORKSPACE_TOKEN_VALUE = ctx.apiToken().get()\n",
    "    SPARK_CONFIG__DATABRICKS__WORKSPACE_INSTANCE_URL = f\"https://{ctx.tags().get('browserHostName').get()}\"\n",
    "else:\n",
    "    DATABRICKS_WORKSPACE_TOKEN_VALUE = None                  # Set Databricks workspace token to use databricks\n",
    "    SPARK_CONFIG__DATABRICKS__WORKSPACE_INSTANCE_URL = None  # Set Databricks workspace url to use databricks\n",
    "\n",
    "# TODO fill values to use Azure Synapse cluster:\n",
    "AZURE_SYNAPSE_SPARK_POOL = None  # Set Azure Synapse Spark pool name\n",
    "ADLS_KEY = None                  # Set Azure Data Lake Storage key to use Azure Synapse\n",
    "\n",
    "# TODO if you deployed resources manually using different names, you'll need to change the following values accordingly: \n",
    "ADLS_ACCOUNT = f\"{RESOURCE_PREFIX}dls\"\n",
    "ADLS_FS_NAME = f\"{RESOURCE_PREFIX}fs\"\n",
    "AZURE_SYNAPSE_URL = f\"https://{RESOURCE_PREFIX}syws.dev.azuresynapse.net\"  # Set Azure Synapse workspace url to use Azure Synapse\n",
    "KEY_VAULT_URI = f\"https://{RESOURCE_PREFIX}kv.vault.azure.net\"\n",
    "REDIS_HOST = f\"{RESOURCE_PREFIX}redis.redis.cache.windows.net\"\n",
    "REGISTRY_ENDPOINT = f\"https://{RESOURCE_PREFIX}webapp.azurewebsites.net/api/v1\"\n",
    "AZURE_SYNAPSE_WORKING_DIR = f\"abfss://{ADLS_FS_NAME}@{ADLS_ACCOUNT}.dfs.core.windows.net/{PROJECT_NAME}\"\n",
    "\n",
    "# An existing Feathr config file path. If None, we'll generate a new config based on the constants in this cell.\n",
    "FEATHR_CONFIG_PATH = None\n",
    "\n",
    "USE_CLI_AUTH = False  # Set to True to use CLI authentication\n",
    "\n",
    "# If set True, register the features to Feathr registry.\n",
    "REGISTER_FEATURES = False\n",
    "\n",
    "# (For the notebook test pipeline) If true, use ScrapBook package to collect the results.\n",
    "SCRAP_RESULTS = False"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Get Azure credential to access Azure resources"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get an authentication credential to access Azure resources and register features\n",
    "if USE_CLI_AUTH:\n",
    "    # Use AZ CLI interactive browser authentication\n",
    "    !az login --use-device-code\n",
    "    from azure.identity import AzureCliCredential\n",
    "    credential = AzureCliCredential(additionally_allowed_tenants=['*'],)\n",
    "elif \"AZURE_TENANT_ID\" in os.environ and \"AZURE_CLIENT_ID\" in os.environ and \"AZURE_CLIENT_SECRET\" in os.environ:\n",
    "    # Use Environment variable secret\n",
    "    from azure.identity import EnvironmentCredential\n",
    "    credential = EnvironmentCredential()\n",
    "else:\n",
    "    # Try to use the default credential\n",
    "    from azure.identity import DefaultAzureCredential\n",
    "    credential = DefaultAzureCredential(\n",
    "        exclude_interactive_browser_credential=False,\n",
    "        additionally_allowed_tenants=['*'],\n",
    "    )"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Set the environment variables\n",
    "Set the environment variables that will be used by Feathr as configuration. Feathr supports configuration via enviroment variables and yaml. You can read more about it [here](https://feathr-ai.github.io/feathr/how-to-guides/feathr-configuration-and-env.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if SPARK_CLUSTER == \"azure_synapse\" and not os.environ.get(\"ADLS_KEY\"):\n",
    "    os.environ[\"ADLS_KEY\"] = ADLS_KEY\n",
    "elif SPARK_CLUSTER == \"databricks\" and not os.environ.get(\"DATABRICKS_WORKSPACE_TOKEN_VALUE\"):\n",
    "    os.environ[\"DATABRICKS_WORKSPACE_TOKEN_VALUE\"] = DATABRICKS_WORKSPACE_TOKEN_VALUE"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Redis password\n",
    "if 'REDIS_PASSWORD' not in os.environ:\n",
    "    from azure.keyvault.secrets import SecretClient\n",
    "    \n",
    "    secret_client = SecretClient(vault_url=KEY_VAULT_URI, credential=credential)\n",
    "    retrieved_secret = secret_client.get_secret('FEATHR-ONLINE-STORE-CONN').value\n",
    "    os.environ['REDIS_PASSWORD'] = retrieved_secret.split(\",\")[1].split(\"password=\", 1)[1]"
   ]
  },
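  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The parsing in the cell above assumes that the Key Vault secret is a comma-separated connection string whose second field has the form `password=...`. A minimal sketch on a made-up connection string (the host, port, and password below are placeholders, not real values) shows what the two `split` calls do:\n",
    "\n",
    "```python\n",
    "# Hypothetical connection string in the shape the parsing code expects\n",
    "sample_conn = \"example.redis.cache.windows.net:6380,password=not-a-real-secret,ssl=True\"\n",
    "\n",
    "# Take the second comma-separated field, then keep everything after the first \"password=\"\n",
    "password = sample_conn.split(\",\")[1].split(\"password=\", 1)[1]\n",
    "print(password)  # not-a-real-secret\n",
    "```\n",
    "\n",
    "If your secret is stored in a different format, adjust the parsing accordingly or set the `REDIS_PASSWORD` environment variable directly."
   ]
  },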
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
 If you run into">
    "> If you run into issues where the Key Vault or other resources are not found from the notebook even though they exist, make sure you are connected to the right subscription by running `az account show` and `az account set --subscription <subscription_id>`."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "4a1f37e9-eb40-4791-9904-19e13a98f5c9",
     "showTitle": false,
     "title": ""
    }
   },
   "source": [
    "### Generate the Feathr client configuration\n",
    "\n",
    "The code below will write the onfiguration to a temporary location that will be used by a Feathr client. Please refer to [feathr_config.yaml](https://github.com/feathr-ai/feathr/blob/main/feathr_project/feathrcli/data/feathr_user_workspace/feathr_config.yaml) for full list of configuration options and details."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "c7cd2bc7-237c-4170-a9b7-ae94f279bbba",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "if FEATHR_CONFIG_PATH:\n",
    "    config_path = FEATHR_CONFIG_PATH\n",
    "else:\n",
    "    config_path = generate_config(\n",
    "        resource_prefix=RESOURCE_PREFIX,\n",
    "        project_name=PROJECT_NAME,\n",
    "        online_store__redis__host=REDIS_HOST,\n",
    "        feature_registry__api_endpoint=REGISTRY_ENDPOINT,\n",
    "        spark_config__spark_cluster=SPARK_CLUSTER,\n",
    "        spark_config__azure_synapse__dev_url=AZURE_SYNAPSE_URL,\n",
    "        spark_config__azure_synapse__pool_name=AZURE_SYNAPSE_SPARK_POOL,\n",
    "        spark_config__azure_synapse__workspace_dir=AZURE_SYNAPSE_WORKING_DIR,\n",
    "        spark_config__databricks__workspace_instance_url=SPARK_CONFIG__DATABRICKS__WORKSPACE_INSTANCE_URL,\n",
    "        databricks_cluster_id=DATABRICKS_CLUSTER_ID,\n",
    "    )\n",
    "\n",
    "with open(config_path, 'r') as f: \n",
    "    print(f.read())"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "794492ed-66b0-4787-adc6-3f234c4739a9",
     "showTitle": false,
     "title": ""
    }
   },
   "source": [
    "### Initialize Feathr Client"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "0c748f9d-210b-4c1d-a414-b30328d5e219",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "client = FeathrClient(config_path=config_path, credential=credential)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Prepare Datasets\n",
    "\n",
    "1. Download datasets\n",
    "2. Upload to cloud storage if necessary so that the target cluster can consume them as the data sources"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Use dbfs if the notebook is running on Databricks\n",
    "if is_databricks():\n",
    "    WORKING_DIR = f\"/dbfs/{PROJECT_NAME}\"\n",
    "else:\n",
    "    WORKING_DIR = PROJECT_NAME"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Download datasets\n",
    "user_observation_file_path = f\"{WORKING_DIR}/user_observation.csv\"\n",
    "user_profile_file_path = f\"{WORKING_DIR}/user_profile.csv\"\n",
    "user_purchase_history_file_path = f\"{WORKING_DIR}/user_purchase_history.csv\"\n",
    "product_detail_file_path = f\"{WORKING_DIR}/product_detail.csv\"\n",
    "maybe_download(\n",
    "    src_url=PRODUCT_RECOMMENDATION_USER_OBSERVATION_URL,\n",
    "    dst_filepath=user_observation_file_path,\n",
    ")\n",
    "maybe_download(\n",
    "    src_url=PRODUCT_RECOMMENDATION_USER_PROFILE_URL,\n",
    "    dst_filepath=user_profile_file_path,\n",
    ")\n",
    "maybe_download(\n",
    "    src_url=PRODUCT_RECOMMENDATION_USER_PURCHASE_HISTORY_URL,\n",
    "    dst_filepath=user_purchase_history_file_path,\n",
    ")\n",
    "maybe_download(\n",
    "    src_url=PRODUCT_RECOMMENDATION_PRODUCT_DETAIL_URL,\n",
    "    dst_filepath=product_detail_file_path,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Upload files to cloud if needed\n",
    "if client.spark_runtime == \"local\":\n",
    "    # In local mode, we can use the same data path as the source.\n",
    "    # If the notebook is running on databricks, DATA_FILE_PATH should be already a dbfs path.\n",
    "    user_observation_source_path = user_observation_file_path\n",
    "    user_profile_source_path = user_profile_file_path\n",
    "    user_purchase_history_source_path = user_purchase_history_file_path\n",
    "    product_detail_source_path = product_detail_file_path\n",
    "elif client.spark_runtime == \"databricks\" and is_databricks():\n",
    "    # If the notebook is running on databricks, we can use the same data path as the source.\n",
    "    user_observation_source_path = user_observation_file_path.replace(\"/dbfs\", \"dbfs:\")\n",
    "    user_profile_source_path = user_profile_file_path.replace(\"/dbfs\", \"dbfs:\")\n",
    "    user_purchase_history_source_path = user_purchase_history_file_path.replace(\"/dbfs\", \"dbfs:\")\n",
    "    product_detail_source_path = product_detail_file_path.replace(\"/dbfs\", \"dbfs:\")\n",
    "else:\n",
    "    # Otherwise, upload the local file to the cloud storage (either dbfs or adls).\n",
    "    user_observation_source_path = client.feathr_spark_launcher.upload_or_get_cloud_path(user_observation_file_path)\n",
    "    user_profile_source_path = client.feathr_spark_launcher.upload_or_get_cloud_path(user_profile_file_path)\n",
    "    user_purchase_history_source_path = client.feathr_spark_launcher.upload_or_get_cloud_path(user_purchase_history_file_path)\n",
    "    product_detail_source_path = client.feathr_spark_launcher.upload_or_get_cloud_path(product_detail_file_path)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "46b45998-d933-4417-b152-7db091c0d5bd",
     "showTitle": false,
     "title": ""
    }
   },
   "source": [
    "## 3. Define Sharable Features using Feathr API\n",
    "\n",
    "### Understand raw datasets\n",
    "We have three datasets to work with:\n",
    "* Observation dataset (a.k.a. labeled dataset)\n",
    "* User profile\n",
    "* User purchase history\n",
    "* Product details"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "591b1801-5783-4d88-b7b7-ff3bbcfa0a9e",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "# Observation dataset\n",
    "# Observation dataset usually comes with a event_timestamp to denote when the observation happened.\n",
    "# The label here is product_rating. Our model objective is to predict a user's rating for this product.\n",
    "pd.read_csv(user_observation_file_path).head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "11b8a74f-c0e1-4556-9a97-f17f8a90a795",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "# User profile dataset\n",
    "# Used to generate user features\n",
    "pd.read_csv(user_profile_file_path).head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "12f237da-a7fb-48c2-985e-a8cdfa3bb3fc",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "# User purchase history dataset.\n",
    "# Used to generate user features. This is activity type data, so we need to use aggregation to genearte features.\n",
    "pd.read_csv(user_purchase_history_file_path).head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "333ef001-50c8-4556-b484-78715b657dbb",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "# Product detail dataset.\n",
    "# Used to generate product features.\n",
    "pd.read_csv(product_detail_file_path).head()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "bdc5a2e1-ccd4-4d61-9168-b0e4f571587b",
     "showTitle": false,
     "title": ""
    }
   },
   "source": [
    "### What's a feature in Feathr\n",
    "A feature is an individual measurable property or characteristic of a phenomenon which is sometimes time-sensitive. \n",
    "\n",
    "In Feathr, a feature is defined by the following characteristics:\n",
    "* The typed key (a.k.a. entity id): identifies the subject of feature, e.g. a user id of 123, a product id of SKU234456.\n",
    "* The feature name: the unique identifier of the feature, e.g. user_age, total_spending_in_30_days.\n",
    "* The feature value: the actual value of that aspect at a particular time, e.g. the feature value of the person's age is 30 at year 2022.\n",
    "* The timestamp: this indicates when the event happened. For example, the user purchased certain product on a certain timestamp. This is usually used for point-in-time join.\n",
    "\n",
    "You can feel that this is defined from a feature consumer (a person who wants to use a feature) perspective. It only tells us what a feature is like. In later sections, you can see how a feature consumer can access the features in a very simple way.\n",
    "\n",
    "To define how to produce the feature, we need to specify:\n",
    "* Feature source: what source data that this feature is based on\n",
    "* Transformation: what transformation is used to transform the source data into feature. Transformation can be optional when you just want to take a column out from the source data.\n",
    "\n",
    "(For more details on feature definition, please refer to the [Feathr Feature Definition Guide](https://feathr-ai.github.io/feathr/concepts/feature-definition.html).)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "30e2c57d-6487-4d72-bd78-80d17325f1a9",
     "showTitle": false,
     "title": ""
    }
   },
   "source": [
    "Note: in some cases, such as features defined on top of request data, may have no entity key or timestamp.\n",
    "It is merely a function/transformation executing against request data at runtime.\n",
    "For example, the day of week of the request, which is calculated by converting the request UNIX timestamp.\n",
    "(We won't cover this in the tutorial.)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "64fc4ef8-ccde-4724-8eff-1263c08de39f",
     "showTitle": false,
     "title": ""
    }
   },
   "source": [
    "### Define Sources Section with UDFs\n",
    "\n",
    "A feature is called an anchored feature when the feature is directly extracted from the source data, rather than computed on top of other features. The latter case is called derived feature.\n",
    "\n",
    "A [feature source](https://feathr.readthedocs.io/en/latest/#feathr.Source) is needed for anchored features that describes the raw data in which the feature values are computed from. See the python documentation to get the details on each input column.\n",
    "\n",
    "See [the python API documentation](https://feathr.readthedocs.io/en/latest/#feathr.HdfsSource) to get the details of each input fields. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "c32249b5-599b-4337-bebf-c33693354685",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "def feathr_udf_preprocessing(df: DataFrame) -> DataFrame:\n",
    "    from pyspark.sql.functions import col\n",
    "\n",
    "    return df.withColumn(\"tax_rate_decimal\", col(\"tax_rate\") / 100)\n",
    "\n",
    "\n",
    "batch_source = HdfsSource(\n",
    "    name=\"userProfileData\",\n",
    "    path=user_profile_source_path,\n",
    "    preprocessing=feathr_udf_preprocessing,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "2961afe9-4bdc-48ba-a63f-229081f557a3",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "# Let's define some features for users so our recommendation can be customized for users.\n",
    "user_id = TypedKey(\n",
    "    key_column=\"user_id\",\n",
    "    key_column_type=ValueType.INT32,\n",
    "    description=\"user id\",\n",
    "    full_name=\"product_recommendation.user_id\",\n",
    ")\n",
    "\n",
    "feature_user_age = Feature(\n",
    "    name=\"feature_user_age\",\n",
    "    key=user_id,\n",
    "    feature_type=INT32,\n",
    "    transform=\"age\",\n",
    ")\n",
    "feature_user_tax_rate = Feature(\n",
    "    name=\"feature_user_tax_rate\",\n",
    "    key=user_id,\n",
    "    feature_type=FLOAT,\n",
    "    transform=\"tax_rate_decimal\",\n",
    ")\n",
    "feature_user_gift_card_balance = Feature(\n",
    "    name=\"feature_user_gift_card_balance\",\n",
    "    key=user_id,\n",
    "    feature_type=FLOAT,\n",
    "    transform=\"gift_card_balance\",\n",
    ")\n",
    "feature_user_has_valid_credit_card = Feature(\n",
    "    name=\"feature_user_has_valid_credit_card\",\n",
    "    key=user_id,\n",
    "    feature_type=BOOLEAN,\n",
    "    transform=\"number_of_credit_cards > 0\",\n",
    ")\n",
    "\n",
    "features = [\n",
    "    feature_user_age,\n",
    "    feature_user_tax_rate,\n",
    "    feature_user_gift_card_balance,\n",
    "    feature_user_has_valid_credit_card,\n",
    "]\n",
    "\n",
    "user_feature_anchor = FeatureAnchor(\n",
    "    name=\"anchored_features\", source=batch_source, features=features\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "4da453e8-a8fd-40b8-a1e6-2a0e7cac3f6e",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "# Let's define some features for the products so our recommendation can be customized for proudcts.\n",
    "product_batch_source = HdfsSource(\n",
    "    name=\"productProfileData\",\n",
    "    path=product_detail_source_path,\n",
    ")\n",
    "\n",
    "product_id = TypedKey(\n",
    "    key_column=\"product_id\",\n",
    "    key_column_type=ValueType.INT32,\n",
    "    description=\"product id\",\n",
    "    full_name=\"product_recommendation.product_id\",\n",
    ")\n",
    "\n",
    "feature_product_quantity = Feature(\n",
    "    name=\"feature_product_quantity\",\n",
    "    key=product_id,\n",
    "    feature_type=FLOAT,\n",
    "    transform=\"quantity\",\n",
    ")\n",
    "feature_product_price = Feature(\n",
    "    name=\"feature_product_price\", key=product_id, feature_type=FLOAT, transform=\"price\"\n",
    ")\n",
    "\n",
    "product_features = [feature_product_quantity, feature_product_price]\n",
    "\n",
    "product_feature_anchor = FeatureAnchor(\n",
    "    name=\"product_anchored_features\",\n",
    "    source=product_batch_source,\n",
    "    features=product_features,\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "78e240b4-dcab-499f-b6ed-72a14bfab968",
     "showTitle": false,
     "title": ""
    }
   },
   "source": [
    "### Define window aggregation features\n",
    "\n",
    "[Window aggregation](https://en.wikipedia.org/wiki/Window_function_%28SQL%29) helps us to create more powerful features by compressing large amount of information. For example, we can compute *average purchase amount over the last 90 days* from the purchase history to capture user's recent consumption trend.\n",
    "\n",
    "To create window aggregation features, we define `WindowAggTransformation` with following arguments:\n",
    "1. `agg_expr`: the field/column you want to aggregate. It can be an ANSI SQL expression, e.g. `cast_float(purchase_amount)` to cast `str` type values to `float`.\n",
    "2. `agg_func`: the aggregation function, e.g. `AVG`. See below table for the full list of supported functions.\n",
    "3. `window`: the aggregation window size, e.g. `90d` to aggregate over the 90 days.\n",
    "\n",
    "| Aggregation Type | Input Type | Description |\n",
    "| --- | --- | --- |\n",
    "| `SUM`, `COUNT`, `MAX`, `MIN`, `AVG` | Numeric | Applies the the numerical operation on the numeric inputs. |\n",
    "| `MAX_POOLING`, `MIN_POOLING`, `AVG_POOLING`\t| Numeric Vector | Applies the max/min/avg operation on a per entry basis for a given a collection of numbers. |\n",
    "| `LATEST` | Any | Returns the latest not-null values from within the defined time window. |\n",
    "\n",
    "After you have defined features and sources, bring them together to build an anchor:\n",
    "\n",
    "> Note that if the features comes directly from the observation data, the `source` argument should be `INPUT_CONTEXT` to indicate the source of the anchor is the observation data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "b62a9041-73dc-45e1-add5-8fe01ebf355f",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "purchase_history_data = HdfsSource(\n",
    "    name=\"purchase_history_data\",\n",
    "    path=user_purchase_history_source_path,\n",
    "    event_timestamp_column=\"purchase_date\",\n",
    "    timestamp_format=\"yyyy-MM-dd\",\n",
    ")\n",
    "\n",
    "agg_features = [\n",
    "    Feature(\n",
    "        name=\"feature_user_avg_purchase_for_90days\",\n",
    "        key=user_id,\n",
    "        feature_type=FLOAT,\n",
    "        transform=WindowAggTransformation(\n",
    "            agg_expr=\"cast_float(purchase_amount)\", agg_func=\"AVG\", window=\"90d\"\n",
    "        ),\n",
    "    )\n",
    "]\n",
    "\n",
    "user_agg_feature_anchor = FeatureAnchor(\n",
    "    name=\"aggregationFeatures\", source=purchase_history_data, features=agg_features\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "a04373b5-8ab9-4c36-892f-6aa8129df999",
     "showTitle": false,
     "title": ""
    }
   },
   "source": [
    "### Derived Features Section\n",
    "Derived features are the features that are computed from other features. They could be computed from anchored features or other derived features."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "688a4562-d8e9-468a-a900-77e750a3c903",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "derived_features = [\n",
    "    DerivedFeature(\n",
    "        name=\"feature_user_purchasing_power\",\n",
    "        key=user_id,\n",
    "        feature_type=FLOAT,\n",
    "        input_features=[feature_user_gift_card_balance, feature_user_has_valid_credit_card],\n",
    "        transform=\"feature_user_gift_card_balance + if(boolean(feature_user_has_valid_credit_card), 100, 0)\",\n",
    "    )\n",
    "]"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "f4d8f829-bfbc-4d6f-bc32-3a419a32e3d3",
     "showTitle": false,
     "title": ""
    }
   },
   "source": [
    "### Build features\n",
    "\n",
    "Lastly, we need to build those features so that it can be consumed later. Note that we have to build both the \"anchor\" and the \"derived\" features which is not anchored to a source."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "4c617bb8-2605-4d40-acc9-2156c86dfc56",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "client.build_features(\n",
    "    anchor_list=[user_agg_feature_anchor, user_feature_anchor, product_feature_anchor],\n",
    "    derived_feature_list=derived_features,\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "6b2877d0-2ab8-4c07-99d4-effc7336ee8a",
     "showTitle": false,
     "title": ""
    }
   },
   "source": [
    "## 4. Create Training Data using Point-in-Time Correct Feature join\n",
    "\n",
    "To create a training dataset using Feathr, we need to provide a **feature join settings** to specify what features and how these features should be joined to the observation data. \n",
    "\n",
    "Also note that since a `FeatureQuery` accepts features of the same join key, we define two query objects, one for `user_id` key and the other one for `product_id` and pass them together to compute offline features. \n",
    "\n",
    "To learn more on this topic, please refer to [Point-in-time Correctness document](https://feathr-ai.github.io/feathr/concepts/point-in-time-join.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "30302a53-561f-4b85-ba25-8de9fc843c63",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "user_feature_query = FeatureQuery(\n",
    "    feature_list=[feat.name for feat in features + agg_features + derived_features],\n",
    "    key=user_id,\n",
    ")\n",
    "\n",
    "product_feature_query = FeatureQuery(\n",
    "    feature_list=[feat.name for feat in product_features],\n",
    "    key=product_id,\n",
    ")\n",
    "\n",
    "settings = ObservationSettings(\n",
    "    observation_path=user_observation_source_path,\n",
    "    event_timestamp_column=\"event_timestamp\",\n",
    "    timestamp_format=\"yyyy-MM-dd\",\n",
    ")\n",
    "client.get_offline_features(\n",
    "    observation_settings=settings,\n",
    "    feature_query=[user_feature_query, product_feature_query],\n",
    "    output_path=user_profile_source_path.rpartition(\"/\")[0] + f\"/product_recommendation_features.avro\",\n",
    ")\n",
    "client.wait_job_to_finish(timeout_sec=5000)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "cc7b6276-70c1-494f-83ca-53d442e3198a",
     "showTitle": false,
     "title": ""
    }
   },
   "source": [
    "Let's use the helper function `get_result_df` to download the result and view it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "120c9a21-1e1d-4ef5-8fe9-00d35a93cbf1",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "res_df = get_result_df(client)\n",
    "res_df.head()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "497d6a3b-94e2-4087-94b1-0a5d7baf3ab3",
     "showTitle": false,
     "title": ""
    }
   },
   "source": [
    "### Train a machine learning model\n",
    "After getting all the features, let's train a machine learning model with the converted feature by Feathr. Here, we use **EBM (Explainable Boosting Machine)** regressor from [InterpretML](https://github.com/interpretml/interpret) package to visualize the modeling results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "9bd661ae-430e-449b-9a62-9155828de099",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "from interpret import show\n",
    "from interpret.glassbox import ExplainableBoostingRegressor\n",
    "from sklearn.metrics import mean_squared_error\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "\n",
    "# Fill None values with 0\n",
    "final_df = (\n",
    "    res_df\n",
    "    .drop([\"event_timestamp\"], axis=1, errors=\"ignore\")\n",
    "    .fillna(0)\n",
    ")\n",
    "\n",
    "# Split data into train and test\n",
    "X_train, X_test, y_train, y_test = train_test_split(\n",
    "    final_df.drop([\"product_rating\"], axis=1),\n",
    "    final_df[\"product_rating\"].astype(\"float64\"),\n",
    "    test_size=0.2,\n",
    "    random_state=42,\n",
    ")\n",
    "\n",
    "ebm = ExplainableBoostingRegressor()\n",
    "ebm.fit(X_train, y_train)\n",
    "\n",
    "# Note, currently InterpretML's visualization dashboard doesn't work w/ VSCODE notebook viewer\n",
    "# https://github.com/interpretml/interpret/issues/317\n",
    "ebm_global = ebm.explain_global()\n",
    "show(ebm_global)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Predict and evaluate\n",
    "y_pred = ebm.predict(X_test)\n",
    "rmse = sqrt(mean_squared_error(y_test.values.flatten(), y_pred))\n",
    "\n",
    "print(f\"Root mean squared error: {rmse}\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "fda62a21-e7d6-4044-879f-bc05f77d248e",
     "showTitle": false,
     "title": ""
    }
   },
   "source": [
    "## 5. Feature Materialization\n",
    "\n",
    "While Feathr can compute the feature value from the feature definition on-the-fly at request time, it can also pre-compute\n",
    "and materialize the feature value to offline and/or online storage. \n",
    "\n",
    "We can push the generated features to the online store like below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "3375f18d-cb64-4f13-8789-07b9d9c5835e",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "# Materialize user features\n",
    "# Note, you can only materialize features of same entity key into one table.\n",
    "redisSink = RedisSink(table_name=\"user_features\")\n",
    "settings = MaterializationSettings(\n",
    "    name=\"user_feature_setting\",\n",
    "    sinks=[redisSink],\n",
    "    feature_names=[\"feature_user_age\", \"feature_user_gift_card_balance\"],\n",
    ")\n",
    "\n",
    "client.materialize_features(settings=settings, allow_materialize_non_agg_feature=True)\n",
    "client.wait_job_to_finish(timeout_sec=5000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "7fb61ed8-6db4-461c-bd86-a5ff268a7c3d",
     "showTitle": false,
     "title": ""
    }
   },
   "source": [
    "We can then get the features from the online store (Redis):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "9d8f3710-d2d4-463a-b452-99bd56bb3482",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "client.get_online_features(\n",
    "    \"user_features\", \"2\", [\"feature_user_age\", \"feature_user_gift_card_balance\"]\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "e8aa6e5f-5b2d-4778-bafa-5a3a45fdd3b5",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "client.multi_get_online_features(\n",
    "    \"user_features\", [\"1\", \"2\"], [\"feature_user_age\", \"feature_user_gift_card_balance\"]\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "b19b73c6-7b0e-4b22-8eb1-8afdc328df74",
     "showTitle": false,
     "title": ""
    }
   },
   "source": [
    "We can also materialize product features into a separate table."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "7a28cc6f-06f7-4915-9f3e-0a057467b77b",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "# Materialize product features\n",
    "backfill_time = BackfillTime(\n",
    "    start=datetime(2020, 5, 20),\n",
    "    end=datetime(2020, 5, 20),\n",
    "    step=timedelta(days=1),\n",
    ")\n",
    "\n",
    "redisSink = RedisSink(table_name=\"product_features\")\n",
    "settings = MaterializationSettings(\n",
    "    \"product_feature_setting\",\n",
    "    backfill_time=backfill_time,\n",
    "    sinks=[redisSink],\n",
    "    feature_names=[\"feature_product_price\"],\n",
    ")\n",
    "\n",
    "client.materialize_features(settings, allow_materialize_non_agg_feature=True)\n",
    "client.wait_job_to_finish(timeout_sec=5000)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "8732aad1-7b22-4efc-8e2c-722030ae8bfb",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "client.get_online_features(\"product_features\", \"2\", [\"feature_product_price\"])"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "acd29f4d-715b-4889-954d-b648ea8e2a0f",
     "showTitle": false,
     "title": ""
    }
   },
   "source": [
    "## 6. Feature Registration\n",
    "Lastly, we can also register the features and share them across teams:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "inputWidgets": {},
     "nuid": "1255ed12-5030-43b6-b733-5a467874b708",
     "showTitle": false,
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "if REGISTER_FEATURES:\n",
    "    try:\n",
    "        client.register_features()\n",
    "    except Exception as e:\n",
    "        print(e)\n",
    "    print(client.list_registered_features(project_name=PROJECT_NAME))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Cleanup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Cleaning up the output files. CAUTION: this maybe dangerous if you \"reused\" the project name.\n",
    "import shutil\n",
    "shutil.rmtree(WORKING_DIR, ignore_errors=False)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Scrap Variables for Unit-Test"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if SCRAP_RESULTS:\n",
    "    # Record results for test pipelines\n",
    "    import scrapbook as sb\n",
    "    sb.glue(\n",
    "        \"user_features\",\n",
    "        client.get_online_features(\n",
    "            \"user_features\", \"2\", [\"feature_user_age\", \"feature_user_gift_card_balance\"]\n",
    "        ),\n",
    "    )\n",
    "    sb.glue(\n",
    "        \"product_features\",\n",
    "        client.get_online_features(\n",
    "            \"product_features\", \"2\", [\"feature_product_price\"]\n",
    "        ),\n",
    "    )\n",
    "    \n",
    "    sb.glue(\"rmse\", rmse)"
   ]
  }
 ],
 "metadata": {
  "application/vnd.databricks.v1+notebook": {
   "dashboards": [],
   "language": "python",
   "notebookMetadata": {
    "pythonIndentUnit": 4
   },
   "notebookName": "product_recommendation_demo_advanced",
   "notebookOrigID": 411375353096492,
   "widgets": {}
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5 (default, Jan 27 2021, 15:41:15) \n[GCC 9.3.0]"
  },
  "vscode": {
   "interpreter": {
    "hash": "916dbcbb3f70747c44a77c7bcd40155683ae19c65e1c03b4aa3499c5328201f1"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
